Fault-Aware Non-Collective Communication Creation and Reparation in MPI
The increasing size of HPC architectures makes faults an increasingly
frequent occurrence. This issue becomes especially relevant since
MPI, the de-facto standard for inter-process communication, lacks proper fault
management functionalities. Past efforts produced extensions to the MPI
standard that enabled fault management, including ULFM. While providing
powerful tools to handle faults, it still faces limitations like the
collectiveness of the repair procedure. With this paper, we overcome those
limitations and achieve fault-aware non-collective communicator creation and
reparation. We integrate our solution into an existing fault resiliency
framework and measure the overhead introduced in the application code. The
experimental campaign shows that our solution is scalable and introduces a
limited overhead, and the non-collective reparation is a viable opportunity for
ULFM-based applications.
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.
Comment: version 5.0 (updated on September 2018). Preprint version of our
accepted journal paper at ACM CSUR 2018 (42 pages). This survey will be updated
quarterly here (send me your newly published papers to be added in the
subsequent version). History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
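The two problems named in the survey abstract above can be illustrated by enumerating their search spaces for a toy pass list. This is only a sketch: the pass names are hypothetical placeholders, and the ordering space is simplified to permutations of the full pass set without repetition.

```python
from itertools import combinations, permutations

# Hypothetical pass names, used only for illustration.
PASSES = ("inline", "unroll", "vectorize")


def selection_space(passes):
    """Optimization selection: every subset of passes, 2**n candidates."""
    return [
        subset
        for k in range(len(passes) + 1)
        for subset in combinations(passes, k)
    ]


def ordering_space(passes):
    """Phase-ordering (simplified): every order of the full set, n! candidates."""
    return list(permutations(passes))


if __name__ == "__main__":
    print(len(selection_space(PASSES)), "selections")
    print(len(ordering_space(PASSES)), "orderings")
```

Both spaces grow too quickly to enumerate for realistic pass counts, which is why the surveyed approaches turn to learned models instead of exhaustive search.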
Legio: Fault Resiliency for Embarrassingly Parallel MPI Applications
Due to the increasing size of HPC machines, faults are becoming an
eventuality that applications must face. Natively, MPI provides no support for
continuing execution after a fault is detected, and this is becoming more and
more constraining. ULFM (User Level Failure Mitigation) offers a way to
recover from a fault during the application execution at the cost of code
modifications. ULFM is intrusive in the application and also requires a deep
understanding of its recovery procedures.
In this paper we propose Legio, a framework that lowers the complexity of
introducing resiliency in an embarrassingly parallel MPI application. By hiding
ULFM behind the MPI calls, the library is able to expose resiliency features
to the application transparently, thus removing any integration
effort. Upon a fault, the failed nodes are discarded and the execution continues
only with the non-failed ones. A hierarchical implementation of the solution
has also been proposed to reduce the overhead of the repair process when
scaling towards a large number of nodes.
We evaluated our solutions on the Marconi100 cluster at CINECA, showing that
the overhead introduced by the library is negligible and it does not limit the
scalability properties of MPI. We also integrated the solution into
real-world applications and injected faults to further demonstrate its robustness.
Fault Awareness in the MPI 4.0 Session Model
The latest version of MPI introduces new functionalities like the Session
model, but it still lacks fault management mechanisms. Past efforts produced
tools and MPI standard extensions to manage fault presence, including ULFM.
These measures are effective against faults but do not fully support the new
additions to the standard. In this paper, we combine the fault management
possibilities of ULFM with the new Session model functionality introduced in
version 4.0 of the standard. We focus on the communicator creation procedure,
highlighting critical issues and proposing a method to circumvent them. The
experimental campaign shows that the proposed solution does not significantly
affect applications' execution time and scalability while better managing the
occurrence of faults.
An Efficient Monte Carlo-based Probabilistic Time-Dependent Routing Calculation Targeting a Server-Side Car Navigation System
Incorporating speed probability distributions into the computation of route
planning in car navigation systems guarantees more accurate and precise
responses. In this paper, we propose a novel approach for dynamically selecting
the number of samples used for the Monte Carlo simulation to solve the
Probabilistic Time-Dependent Routing (PTDR) problem, thus improving the
computation efficiency. The proposed method is used to determine in a proactive
manner the number of simulations to be done to extract the travel-time
estimation for each specific request while respecting an error threshold as
output quality level. The methodology requires a reduced effort on the
application development side. We adopted an aspect-oriented programming
language (LARA) to instrument the code and a flexible dynamic autotuning
library (mARGOt) to make tuning decisions on the number of samples, improving
execution efficiency. Experimental results demonstrate
that the proposed adaptive approach saves a large fraction of simulations
(between 36% and 81%) with respect to a static approach while considering
different traffic situations, paths and error requirements. Given the
negligible runtime overhead of the proposed approach, it results in an
execution-time speedup between 1.5x and 5.1x. This speedup is reflected at
infrastructure-level in terms of a reduction of around 36% of the computing
resources needed to support the whole navigation pipeline.
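The idea of choosing the number of Monte Carlo samples per request from an error threshold can be sketched as follows. This is not the paper's method, which selects the count proactively through mARGOt. It is a simpler pilot-run heuristic assumed here for illustration, and the function names, segment format and parameter defaults are all hypothetical.

```python
import math
import random
import statistics


def sample_travel_time(segments, rng):
    """Draw one travel time in seconds: sum of length / sampled speed."""
    total = 0.0
    for length_m, speeds_mps in segments:
        total += length_m / rng.choice(speeds_mps)
    return total


def choose_sample_count(segments, rel_error, rng,
                        pilot=100, z=1.96, max_samples=100_000):
    """Pick a sample count from a pilot run (assumed heuristic).

    Uses n >= (z * std / (rel_error * mean)) ** 2, so that the confidence
    half-width of the mean is roughly within rel_error of the mean.
    """
    pilot_times = [sample_travel_time(segments, rng) for _ in range(pilot)]
    mean = statistics.fmean(pilot_times)
    std = statistics.stdev(pilot_times)
    needed = math.ceil((z * std / (rel_error * mean)) ** 2)
    return max(pilot, min(needed, max_samples))


def estimate_travel_time(segments, rel_error, seed=0):
    """Return (mean travel time, number of samples actually used)."""
    rng = random.Random(seed)
    n = choose_sample_count(segments, rel_error, rng)
    times = [sample_travel_time(segments, rng) for _ in range(n)]
    return statistics.fmean(times), n


if __name__ == "__main__":
    # Each segment: (length in metres, list of equally likely speeds in m/s).
    route = [(1000.0, [10.0, 20.0]), (500.0, [5.0, 10.0])]
    print(estimate_travel_time(route, rel_error=0.01))
```

A route with little speed variability needs only the pilot run, while a highly variable route or a tighter threshold raises the sample count. A static approach would pay the worst-case count on every request.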
Multi-objective co-exploration of source code transformations and design space architectures for low-power embedded systems
The exploration of the architectural design space in terms of energy and performance is of primary importance for a broad range of embedded platforms based on the System-On-Chip approach. This paper proposes a methodology for the co-exploration of the design space composed of architectural parameters and source program transformations. A heuristic technique based on Pareto Simulated Annealing (PSA) has been used to efficiently span the multi-objective co-design space composed of the product of the parameters related to the selected program transformations and the configurable architecture. The analysis of the proposed framework has been carried out for a parameterized super-scalar architecture executing a selected set of benchmarks. The reported results show the effectiveness of the proposed co-exploration with respect to the independent exploration of the transformation and architectural spaces to efficiently derive approximate Pareto curves.
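The Pareto curves mentioned above rest on a dominance test between design points. The sketch below shows only that test and the resulting filter, not the PSA search itself. It assumes each design point is an (energy, delay) tuple with both objectives minimized, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if point a is no worse than b everywhere and better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    better = any(x < y for x, y in zip(a, b))
    return no_worse and better


def pareto_front(points):
    """Return the non-dominated points, sorted and without duplicates."""
    front = [
        p for p in points
        if not any(dominates(q, p) for q in points)
    ]
    return sorted(set(front))


if __name__ == "__main__":
    # Hypothetical (energy, delay) pairs for five configurations.
    designs = [(1.0, 9.0), (2.0, 7.0), (3.0, 8.0), (4.0, 4.0), (5.0, 5.0)]
    print(pareto_front(designs))
```

A heuristic such as simulated annealing would generate the candidate points. The filter then keeps only those for which no other point is at least as good in both energy and delay.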
MPSoCs Run-Time Monitoring through Networks-on-Chip
Networks-on-Chip (NoCs) have appeared as a design strategy to overcome the limitations, in terms of scalability, efficiency, and power consumption, of current buses. In this paper, we discuss the idea of using NoCs to monitor system behaviour at run-time by tracing activities at initiators and targets. The main goal of the monitoring system is to retrieve information useful for run-time optimization and resource allocation in adaptive systems. Information detected by probes embedded within NIs is sent to a central unit, in charge of collecting and elaborating the data. We detail the design of the basic blocks and analyse the overhead associated with the ASIC implementation of the monitoring system, and discuss the implications in terms of the additional traffic generated in the NoC.